Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selective sync #834

Merged
merged 2 commits into from
Aug 4, 2017
Merged

Selective sync #834

merged 2 commits into from
Aug 4, 2017

Conversation

okdistribute
Copy link
Collaborator

@okdistribute okdistribute commented Jul 29, 2017

Adds selective sync. This is a workflow requested by the Sloan Digital Sky Survey researchers at Berkeley Lawrence Lab.

New features added:

  1. dat clone <key> --empty : will create a sparse archive and won't download any files by default.
  2. dat sync <dir> --select=<remote-file>,<remote-file>,<remote-file>: will download remote files with the selected names
    OR
  3. dat sync <dir> --select-from-file=<local-file-list.txt> will parse the file and download all files listed (separated by newlines). Default is .datdownload

For example, check out this short video for a demo: https://www.dropbox.com/s/ih0pu1wqt2c1h9b/selective-sync-demo.mov?dl=0

Things this PR doesn't do yet:

  • Exit when done downloading the selected files.
  • UI for progress

@joehand
Copy link
Collaborator

joehand commented Aug 2, 2017

Sweet!

This looks good. The only concern I have is that this is stateless. If you run:

dat sync --select=my-file.txt

Every time you want to update my-file then you have to make sure to include it again. If you run dat sync accidentally or without specifying the file:

dat sync

Then it'll download everything.

In my mind this was a blocker before with the selective sync. Based on your conversation with SDSS folks, do you think we should be worried about this?

We could make it stateful by adding a .datdownload or similar file to track what files to keep in sync.

@ralphtheninja
Copy link
Contributor

ralphtheninja commented Aug 2, 2017

@Karissa What if you want to sync a remote file named file.txt compared to only syncing file names in a local file called file.txt? How would you distinguish between the two? I know it's a silly example :)


try {
if (fs.statSync(input).isFile()) {
parsed = fs.readFileSync(input).toString().trim().split('\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this mean problems for files on windows ending with \r\n?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it to .split(/\r?\n/) would prob work

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!

@ralphtheninja
Copy link
Contributor

ralphtheninja commented Aug 2, 2017

I would split the functionality into two different arguments, e.g. --select and --select-from-file. This also solves the problem of ambiguity.

@mafintosh
Copy link
Contributor

Want this feature so bad :) I think we should use Beaker style filters dat://{key}/{folder}/{file}

debug('archive update')
bus.emit('render')
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna steal most of this and put in hyperdrive later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@mafintosh
Copy link
Contributor

In general this looks solid 👍 👍 👍

@okdistribute
Copy link
Collaborator Author

okdistribute commented Aug 2, 2017

@ralphtheninja yeah i think you bring up a good point, but I agree with @joehand that perhaps having some state file (like .datdownload) would make even more sense, so that later dat sync commands would respect the same parameters.

@okdistribute
Copy link
Collaborator Author

okdistribute commented Aug 2, 2017

OK I made some changes so now:

  • select-from-file allows you to provide a file. It defaults to '.datdownload' so that multiple dat sync commands will follow the same parameters.
  • select now only accepts a comma-separated list of files to sync.
  • Windows-style newlines will work.

TODO in a different PR:

  • Support dat://<key>/dir/file.txt links

Copy link
Collaborator

@joehand joehand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.datdownload file stuff looks good. Last questions:

  • Should we read/check that file for the regular dat sync and dat pull commands, so we avoid accidental downloads? EDIT: misread the code and this works for dat sync already ✨
  • Do we want to write to .datdownload when you pass --select=file.txt,etc.?

package.json Outdated
@@ -1,6 +1,6 @@
{
"name": "dat",
"version": "13.7.0",
"version": "13.8.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna try to get #828 released for 13.8 too, can we increment version separate from the PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -8,6 +8,11 @@ module.exports = {
].join('\n'),
options: [
{
name: 'empty',
default: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add boolean setting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

archive.readdir(dirname, function (err, entries) {
if (err) return bus.emit('exit:error', err)
entries.forEach(function (entry) {
emitter.emit('download', path.join(dirname, entry))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the benefit here of using EventEmitter instead of calling download(entry)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was we could also figure out when it was done and call emitter.emit('end') .. right now it just stays open and you have to watch the download speed

function download (entry) {
debug('selected', entry)
archive.stat(entry, function (err, stat) {
if (err) return bus.emit('exit:error', err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be nice if we can warn without exiting if they select a file that doesn't exist. But that may be a harder UI problem we can solve later.

name: 'selectFromFile',
boolean: false,
default: false,
help: 'Sync only the list of selected files or directories in the given file. (default: .datdownload)',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I use the default? I thought dat sync --selectFromFile would use .datdownload by default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then tried this:

❯ dat sync --select        
/Users/joe/code/node_modules/dat/src/commands/sync.js:80
    if (opts.select) opts.selectedFiles = opts.select.split(',')
                                                      ^

TypeError: opts.select.split is not a function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you just use dat sync and it'll use .datdownload by default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol @ me trying to be too fancy 🤦‍♂️

@joehand
Copy link
Collaborator

joehand commented Aug 3, 2017

It'd be cool to put a message after dat clone --empty on how to use it:

❯ dat clone datproject.org/rmg/librariesio-csv . --empty          
Dat successfully created in empty mode. Download files using pull or sync.

@@ -14,6 +15,7 @@ module.exports = function (state, bus) {
if (state.opts.http) serve(state, bus)

if (state.writable && state.opts.import) doImport(state, bus)
else if (state.opts.sparse) selectiveSync(state, bus)
else download(state, bus)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be cool to have a download progress UI =). I think we can update the dat-node/stats to work for this but that can wait.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, maybe lets open a PR on that

@okdistribute
Copy link
Collaborator Author

@joehand It already gives that message on clone. Or what kind of message were you thinking?

@joehand
Copy link
Collaborator

joehand commented Aug 3, 2017

@joehand It already gives that message on clone. Or what kind of message were you thinking?

Ya just some more instruction on how to use the selective stuff:

❯ dat clone datproject.org/rmg/librariesio-csv . --empty          
Dat successfully created in empty mode. Select which files to download with dat sync:

dat sync --select=readme.md,my-data.csv

Or create a /.datdownload file with your file list:

readme.md
my-data.csv

And run `dat sync`.

Part of the issue was that dat pull doesn't work with selective so that message is a bit misleading right now.

@@ -21,6 +21,19 @@ module.exports = {
abbr: 'ignore-hidden'
},
{
name: 'selectFromFile',
boolean: false,
default: false,
Copy link
Collaborator

@joehand joehand Aug 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set the default here? The --help prints out the default, so this was why I got confused:

dat sync --help
...
    --selectFromFile, -select-from-file
                          Sync only the list of selected files or directories in the given file. (default: .datdownload) (default: false)
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i didn't realize that. yeah sure

@okdistribute
Copy link
Collaborator Author

Now the UI looks like this when trying to download something that doesn't exist:
screen shot 2017-08-03 at 1 31 51 pm

Copy link
Collaborator

@joehand joehand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okdistribute okdistribute merged commit 9a7150a into master Aug 4, 2017
@okdistribute okdistribute deleted the selective-sync branch August 4, 2017 00:05
@freedba
Copy link

freedba commented Mar 5, 2019

selective UI for progress ,when complete ?
Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants